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PARALLEL PROCESSOR, PARALLEL PROCESSING METHOD. AND 

STORING MEDIUM 

BACKGROUND OF THE INVENTION 
5 1. Field of the Invention 

The present invention relates to a parallel 
processor, parallel processing method, and storing medium 
for storing the routine of the method in a computer 
4f readable format . 

10 2. Description of the Related Art 

I""* A single processor running on Unix® or 

X another operating system (OS) must function to manage the 

progress in a plurality of programs simultaneously 
existing in a local memory when executing programs under 
; j| 15 a multi-tasking environment. Inv such a function, use is 

made of the concept of a "process" as opposed to the term 
"program". A "process" is an independent program in 
execution in a memory space (user memory space) which 
that program can independently access set in a local 
20 memory. Execution of a program means running of a 

process, while termination of the program means deletion 
of the process. Also, a process is capable of running and 
deleting other processes and communicating with other 
processes . 

25 Since there is one central processing unit 
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(CPU) in a single processor, a maximum of one process can 
be run at any one time. Therefore, in a single processor, 
the user memory space is simultaneously assigned to a 
plurality of independent programs and the plurality of 
5 programs are alternately executed in a time sharing mode 
to alternately run a plurality of processes and thereby 
realize a multi- tasking environment. 

At this time, when one process is in a 
running state, the other processes are in a waiting 
10 state. 

In the above multi-tasking environment, a 
plurality of processes pass messages among each other as 
described below. 

Namely, in a single processor, as explained 

15 above, since there is a maximum of one process in a 

running state at any one time, when one process sending a 
message is in a running state, another process to receive 
the message is in a waiting state. Therefore, the running 
state process sending the message calls up a process 

20 management task in a kernel of the OS and writes to send 
the message in a table in a memory which stores the 
previous running state of the process to receive the 
message immediately before it shifted to the waiting 
state n or mally — Lable scoring * context of threads). Then, 

25 when the process to receive the message next shifts the 
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running state, it learns that the message was received by 
referring to the table and performs processing in 
accordance therewith. On the other hand, for example, 
when a process is one which proceeds to the next 
5 processing conditional on receiving a message and judges 
that no message was received when shifting to the running 
state and referring to the table, that process enters the 
waiting state. That process shifts to the running state 
only after confirming the receipt of a message. 

10 On the other hand, for example, in a 

multiprocessor which is comprised of a plurality of CPUs 
connected via a common bus and executes a plurality of 
mutually independent programs in parallel, usually a 
maximum of one process is in a running state at one CPU 

15 at any one time, but a plurality of processes can 

simultaneously be in the running state at different CPUs. 

Communication between processes is achieved 
for example by a sending side process passing a message 
over the common bus and an arbiter monitoring the common 

20 bus notifying that message to the receiving side process 

based on instruction codes indicated in the user program 

( ■application progr aaH - Therefore, to pass a message 
A* 

between processes, it is necessary that both the message 
sending side process and receiving side process be in the 
2 5 running state. 
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In this way, in a multiprocessor, usually 
messages are not passed using the process management task 
as in the above explained single processor. That is, 
there is no process management task in a multiprocessor. 
5 In a multiprocessor, however, when it is 

necessary to synchronize a plurality of processes 
operating in parallel, the synchronization is realized by 
using the above message passing. 

Below, a method of synchronizing processes in 
10 a multiprocessor of the related art will be explained. 

First , the configuration of a general 
multiprocessor will be explained. 

Figure 5 is a view of the configuration of a 
general multiprocessor. 
15 As shown in Fig. 5, a multiprocessor 1 is 

configured by connecting, for example, four processor 
elements 1^ to 11 4 via a common bus 17. The common bus 
17 is connected to a common memory 15 and an arbiter 16. 

Here, the processor element 11 L comprises, 
20 for example as shown in Fig. 6, a processor core 31 and a 
local memory 32, stores a user program read from the 
common memory 15 via the common bus 17 in the local 
memory 32, and successively supplies instruction codes of 
the user program stored in the local memory 32 to the 
2 5 processor core 31 for execution. The processor elements 
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11 2 to 11 4 have the same configuration, for example, as 
the processor element llx- 

The arbiter 16 monitors execution states 
(such as the load of the processing) of the processor 
5 elements 11 1 to 11 4 and assigns software resources stored 
in the common memory 15 to the processor elements 11 1 to 
11 4 , that is, the hardware resources. Specifically, the 
arbiter 16 reads the user programs stored in the common 
memory 15 into the local memories 32 shown in Fig. 6 of 
10 the processor elements ll x to 11 4 . 

The arbiter 16, for example as shown in Fig. 
7 , reads a main program Prg_A and subprograms Prg_B , 
Prg_C, Prg_D, and Prg_E as user programs into the local 
memories 32 of the processor elements ll x to 11 4 
15 indicated by the arrows in Fig. 7 at the same time or at 
different times. 

Next, a method of synchronizing among 

programs - ( prooc s s e sj — e#~ the related art in the 
A- 

multiprocessor 1 shown in Fig. 5 will be explained. 

20 First, the main program Prg_A stored in a common memory 
15 is read into the local memory 32 of the processor 
element ll! by the arbiter 16, then, as shown in Fig. 8, 
instruction codes written in the main program Prg_A are 
successively executed in the processor element ll x . 

25 Next, when the instruction code " gen ( Pr g_B ) " 



is executed ±n the .processor element a message 

indicating that^jLs notified to the arbiter 16 via the 
common bus 17. Then, the subprogram Prg_B stored in the 
common memory 15 is read into the local memory 32 of the 
5 processor element 11 2 by the arbiter 16 based on the 
execution states of the processor elements ll x to 11 4 , 
and instruction codes written in the subprogram Prg_B are 
successively executed in the processor element 11 2 . 

Next, when an instruction code " gen ( Prg_C ) " 

10 is executed in the processor element 11 1# a message 

indicating that is notified to the arbiter 16 via the 
common bus 17. Then, the subprogram Prg__C stored in the 
common memory 15 is read into the local memory 32 of the 
processor element 11 3 by the arbiter 16 based on the 

15 execution states of the processor elements ll x to 11 4 , 

and instruction codes written in the subprogram Prg_C are 
successively executed in the processor element 11 3 . 

Next, when an instruction code " gen ( Pr g_D ) " 
is executed in the processor element ll lf a message 

20 indicating that is notified to the arbiter 16 via the 

common bus 17. The subprogram Prg_D stored in the common 
memory 15 is then read into the local memory 32 of the 
processor element 11 4 by the arbiter 16 based on the 
execution states of the processor elements 11 L to 11 4 , 

25 and instruction codes written in the subprogram Prg_D are 
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successively executed in the processor element 11 4 . 

Next , when an instruction code "wait ( Prg_D ) " 



of the processor element 11 L enters a synchronization 
5 waiting state* 

Next, when the last instruction code "end" of 
the subprogram Prg_D is executed in the processor element 
11 4 , a message indicating the completion of the 
subprogram Prg_D is notified to the processor element ll x 
10 via, for example, the arbiter 16, As a result, the 
processor element 11^ releases the synchronization 
waiting state and executes the next instruction code. 

Next, when an instruction code "wait (Prg_C) " 
is executed in the processor element ll lf the processing 
15 of the processor element 11 1 enters a synchronization 
waiting state . 

Next, when the last instruction code "end" of 
the subprogram Prg_C is executed in the processor element 
11 3 , a message indicating the completion of the 
20 subprogram Prg_C is notified to the processor element 11 L 
via, for example, the arbiter 16. As a result, the 
processor element ll x releases the synchronization 
waiting state and executes the next instruction code. 

Next, when an instruction code "gen(Prg_E) " 
25 is executed in the processor element ll r , a message 



is executed in the processor element 11 
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the processing 
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indicating that^jLs notified to the arbiter 16 via the 
common bus 17. Then, the subprogram Prg__E stored in the 
common memory 15 is read into the local memory of, for 
example, the processor element 11 4 by the arbiter 16 
5 based on the execution states of the processor elements 
ll x to 11 4 , and instruction codes written in the 
subprogram Prg_C are successively executed in the 
processor element 11 4 . 

Next, when the instruction code " gen ( Prg_D ) " 

10 is executed again in the processor element ll x , a message 
indicating that^Ls notified to the arbiter 16 via the 
common bus 17. Then, the subprogram Prg_D stored in the 
common memory 15 is read into the local memory 32 of the 
processor element 11 3 by the arbiter 16 based on the 

15 execution states of the processor elements ll x to 11 4 , 

and instruction codes written in the subprogram Prg_D are 
successively executed in the processor element 11 3 . 

Summarizing the problems to be solved by the 
invention, as explained above, in the multiprocessor 1 of 

20 the related art, the synchronization between the programs 
^processesjl executed in different processor elements is a 
simple one of release of a synchronization waiting state 
caused by execution of an instruction code "wait" in one 
processor element based on execution of an instruction 

2 5 code "end" indicating completion of execution of a 



program in another processor element. 

Namely, a synchronization waiting state of a 
processor element based on one program cannot be released 
until the completion of execution of a program in another 
5 processor element* Accordingly, there is a disadvantage 
that a variety of forms of synchronization among 
different programs executed at different processor 
elements such as synchronization among instruction codes 
written in the middle of programs cannot be realized. 

10 Also, in the above embodiment, the arbiter 16 

cannot for example determine which subprogram will be 
called up in the future by a main program Prg_A shown in 
Fig. 8 during execution of the main program Prg__A by the 
processor element ll x . 

15 Therefore, as shown in Fig. 8, there is a 

possibility that the subprogram Prg_D will end up being 
assigned to different processor elements 11 3 and 11 4 by 
the arbiter 16 between a first execution and a second 
execution of the instruction code n gen(Prg__D) " in a 

20 processor element ll^ In such a case, although the 

subprogram Prg_D is executed again after a relatively 
short interval, it is necessary to read the subprogram 
Prg_D from the common memory 15 to the processor element 
11 3 at the time of the second execution, which results in 

25 a longer waiting time of the processor element 11 3 . 
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Such a situation frequently occurs especially 
when the memory capacity of the local memory shown in 
Fig. 6 and the size of the program to be read are of the 
same order and causes a drastic decline of performance of 
5 the multiprocessor 1 . 

SUMMARY OF THE INVENTION 

An object of the present invention is to provide a 
parallel processor and a parallel processing method 

10 enabling various forms of synchronization among programs 
executed in parallel and a storing medium for storing the 
routine of the method in a computer -readable format. 

Another object of the present invention is to 
provide a parallel processor which can shorten a waiting 

15 time of a processor element caused by transfer of a user 
program between a local memory of the processor element 
and a common memory. 

To achieve the above objects, according to a first 
aspect of the present invention, there is provided a 

20 parallel processor comprising a plurality of processing 
means which perform mutually parallel processing on the 
basis of instructions written in programs and are capable 
of communicating with each other via a common bus, 
wherein one of the processing means suspends processing 

2 5 based on a program and enters a waiting state when 
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executing a wait instruction and releases the waiting 
state and restarts the processing based on the program 
based on execution of a wait release instruction by 
another processing means and the other processing means 
5 executes a next instruction without suspending processing 
after it executes the wait release instruction. 

In the parallel processor according to the first 
aspect of the present invention, synchronization is 
established between one processing means and another 

10 processing means at the instruction level while both^ 

executing programs by using a wait instruction and a wait 
release instruction in the programs. Namely, it is 
possible to synchronize among programs without having to 
wait for completion of execution of one program. 

15 Preferably, the other processing means executes a 

synchronization wait instruction to enter a 
synchronization waiting state and releases the 
synchronization waiting state based on execution of the 
wait instruction corresponding to the synchronization 

20 wait instruction or execution of a program end 

instruction indicating an end of a program by the one 
processing means. 

Due to this, execution of a wait instruction in one 
processing means prior to execution of a wait release 

25 instruction in another precessing means can be prevented. 



• # 



- 12 - 

According to a second aspect of the present 
invention, there is provided a parallel processor 
comprising a plurality of processing means which perform 
mutually parallel processing on the basis of instructions 
5 written in programs and are capable of communicating with 
each other via a common bus , wherein one of the 
processing means suspends processing based on a program 
and enters a waiting state when executing a wait 
instruction and releases the waiting state and restarts 

10 the processing based on the program based on execution of 
a wait release instruction by another processing means 
and the other processing means enters a synchronization 
waiting state when executing the wait release instruction 
until the one processing means enters the waiting state 

15 when that one processing means is not in the waiting 
state . 

Namely, in the parallel processor according to the 
second aspect of the present invention, synchronization 
is established between one processing means and another 

20 processing means at an instruction level by using a wait 
instruction and wait release instruction in the programs. 
Namely, it is possible to synchronize among programs 
without waiting for completion of execution of one 
program. Also, even if a synchronization wait instruction 

25 corresponding to a wait release instruction is not 
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written in a program to be processed by another 
processing means, a synchronization waiting state is 
maintained until the other processing means enters the 
waiting state when the one processing means is not in a 
5 waiting state. 

According to a third aspect of the present 
invention, there is provided a parallel processor 
comprising a plurality of processing means which perform 
mutually parallel processing on the basis of instructions 

10 written in programs and are capable of communicating with 
each other via a common bus, comprising a first storage 
means connected to the common bus for storing the 
programs and second storage means provided corresponding 
to the plurality of processing means, reading from the 

15 first storage means programs to be executed by 

corresponding processing means via the common bus, 
supplying the processing means with instructions written 
in the read programs, and having faster access speeds 
than the first storage means; one of the processing means 

20 suspending processing based on a program and entering a 
waiting state when executing a wait instruction and 
releasing the waiting state and restarting the processing 
based on the program based on execution of a wait release 
instruction by another processing means; a second storage 

2 5 means continuing to store a program supplied to its 
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corresponding processing mean before entering the waiting 
state when the processing means is in the waiting state* 

In the parallel processor according to the third 
aspect of the present invention, if one processing means 
5 suspends its processing based on the program and enters a 
waiting state when executing a wait instruction and 
releases the waiting state and resumes the processing 
based on the program based on execution of a wait release 
instruction by another processing means, the program 

10 supplied to the one processing means is continuously 

stored in the second storage means corresponding to the 
one processing means. Namely, when restarting execution 
of the program, it is not necessary to read the program 
from the first storage means to the second storage means. 

15 According to a fourth aspect of the present 

invention, there is provided a parallel processing method 
for performing at least first processing and second 
processing in parallel based on instructions written in 
programs, wherein the first processing suspends 

20 processing based on a program and enters a waiting state 
by executing a wait instruction and releases the waiting 
state and resumes processing based on the program based 
on execution of a wait release instruction in the second 
processing and the second processing executes a next 

2 5 instruction without suspending its processing after 
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executing the wait release instruction. 

According to a fifth aspect of the present 
invention, there is provided a parallel processing method 
for performing at least first processing and second 
5 processing in parallel based on instructions written in 
programs, wherein the first processing suspends 
processing based on a program and enters a waiting state 
by executing a wait instruction and releases the waiting 
state and resumes processing based on the program based 

10 on execution of a wait release instruction in the second 
processing and the second processing enters a 
synchronization waiting state by executing the wait 
release instruction until the first processing enters the 
waiting state when the first processing is not in the 

15 waiting state . 

According to a sixth aspect of the present 
invention, there is provided a storage medium for storing 
in a computer -readable format routines of first 
processing and second processing to be performed in 

20 parallel based on instructions written in programs, 

wherein the first processing is processing which suspends 
processing based on a program and enters a waiting state 
by executing a wait instruction and releases the waiting 
state and resumes processing based on the program based 

25 on execution of a wait release instruction in the second 
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processing and the second processing is processing which 
executes a next instruction without suspending its 
processing after executing the wait release instruction. 
According to a seventh aspect of the present 
5 invention, there is provided a storage medium for storing 
in a computer-readable format routines of first 
processing and second processing to be performed in 
parallel based on instructions written in programs, 
wherein the first processing is processing which suspends 

10 processing based on a program and enters a waiting state 
by executing a wait instruction and releases the waiting 
state and resumes processing based on the program based 
on execution of a wait release instruction in the second 
processing and the second processing is processing which 

15 enters a synchronization waiting state by executing the 
wait release instruction until the first processing 
enters the waiting state when the first processing is not 
in the waiting state. 

20 BRIEF DESCRIPTION OF THE DRAWINGS 

These and other objects and features of the present 
invention will become clearer from the following 
description of the preferred embodiments given with 
reference to the accompanying drawings, in which: 

25 Fig. 1 is a view of the configuration of a 
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multiprocessor according to a first embodiment of the 
present invention ; 

Fig. 2 is a view for explaining the operation of 
the multiprocessor according to the first embodiment of 
5 the present invention shown in Fig. 1; 

Fig. 3 is a view for explaining the operation of a 
multiprocessor according to a second embodiment of the 
present invent ion ; 

Fig. 4 is a view for explaining another operation 
10 of the multiprocessor according to the second embodiment 
of the present invention; 

Fig . 5 is a view of the configuration of a general 
multiprocessor; 

Fig. 6 is a view of the configuration inside a 
15 processor element shown in Figs. 1 and 5; 

Fig. 7 is a view for explaining assignment of 
programs to processor elements of the multiprocessor 
shown in Fig. 5; and 

Fig. 8 is a view for explaining the operation of 
20 the general multiprocessor shown in Fig. 5. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Below, preferred embodiments of a multiprocessor 
according the present invention will be described with 
2 5 reference to the accompanying drawings. 
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First Embodiment 

Figure 1 is a view of the configuration of a 
multiprocessor 51 according to a first embodiment of the 
present invention . 

As shown in Fig. 1, a multiprocessor 51 comprises, 
for example, a common bus 17, processor elements 61 1 to 
61 4 serving as processing means and second storage means, 
a common memory 65 serving as a first storage means, and 
an arbiter 66 serving as a program assigning means. 

The multiprocessor 51 adopts a bus -connected 
architecture where the processor elements 61 x to 61 4 , the 
common memory 65, and the arbiter 66 are mutually 
connected via, for example, the common bus 17. These 
components are built in a single semiconductor chip. 

(Outline of Components) 

Points in common and of difference between the 
multiprocessor 51 and the multiprocessor 1 shown in Fig. 
5 will be explained first. 

The common bus 17 is the same as the common bus 17 
in the multiprocessor 1 of the related art shown in Fig. 
5. 

Also, the processor elements 61 L to 61 4 are the same 
as the processor elements ll x to 11 4 shown in Fig. 5 in 
terms of the hardware configuration of each having a 
processor core 31 and a local memory 32 as shown in Fig. 
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j£, however, / operate differently along with execution of 

instruction codes because the processor elements 61 L to 

61 4 execute new instruction codes for establishing 

synchronization between programs (proocgocs -fr not provided 

5 in the multiprocessor 1 of the related art . 

The memory 65 has the same hardware configuration 

as the common memory 15 shown in Fig. 5, however, stores 

a user program of a different content . 

Furthermore , the arbiter J^r€r is the same as the 

10 arbiter 16 shown in Fig. 5 in the points that it monitors 

execution states ^ load of proco o- siny eLis. -fr> of the 

processor elements to 5rt A and assigns software 

(cS- 

resources stored in the common memory £r5 to the processor 
elements J5rt, to 51 4 / that is, the hardware resources, 

15 based on the execution states. The arbiter however 
performs processing for establishing various forms of 
synchronization different from that by the arbiter 16 
shown in Fig. 5 and assigns processing of user programs 
to the processor elements 61 x to 61 4 when the above new 

20 instruction codes are executed by the processor elements 
61 x to 61 4 . 

( Instruction Codes ) 

The multiprocessor 51 /adopts the instruction codes 
explained below. 
25 Specifically, the multiprocessor 51 has as 



# * 



- 20 - 

instruction codes: "gen (user program name)" as a program 
execution instruction, "wait(user program name)" as a 
synchronization wait instruction, "cont(user program 
name)" as a wait release instruction, "sleep" as a wait 
5 instruction, and "end" as a program end instruction. Note 
that the multiprocessor ^Sly^urther has the variety of 
instruction codes, for example, provided in general- 
purpose multiprocessors. 

Here, the instruction codes "gen (user program 

10 name)", "wait(user program name)", and "end" are the same 
as the instruction codes having the identical names 
adopted in the multiprocessor 1 shown in Fig. 5, while 
the instruction codes "cont(user program name)" and 
"sleep" are first adopted in the multiprocessor^STj/ 

15 Note that the instruction codes "gen (user program 

name)", "wait (user program name) " , and "cont(user program 
name)" use user program names as arguments. Note that 
there may be any number of arguments. 

The instruction code "gen (user program name)" is 

20 for instructing one of the other processor elements 61 L 

to 61 4 to start executing a user program specified by the 
argument (user program name). 

The instruction code "wait (user program name)" is 
for instructing an element to wait for synchronization 

25 until a user program specified by the argument (user 
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program name) executes an Instruction code "sleep" or 
"end". 

The instruction code "cont(user program name)" is 
for instructing one of the other processor elements 61 L 
5 to 61 4 executing a user program specified by the argument 
(user program name) to release the waiting state when in 
the waiting state. 

The instruction code "sleep" is for instructing 
elements to temporarily stop the execution of a user 
10 program and enter the waiting state. 

Note that the above instruction codes may be 
written in the user program when the programmer prepares 
the user program or may be automatically inserted in 
accordance with need by a compiler. 
15 (Details of Components) 

The processor element 61 1 has, as shown in Fig. 5, 
a processor core 31 serving as a processing means and a 
local memory 32 serving as a second storage means. 

Note that in the present embodiment , a case will be 
20 explained where the processor elements 61 t to 61 4 have 
the same configuration as each other. In the present 
invention, however, the processor elements 61 L to 61 4 are 
not necessarily the same in configuration. For example, 
the execution speed of the processor core 31, memory 
2 5 capacity of the local memory 32, -^fee^ may be different. 



The local memory 32 stores a user program read from 
the common memory 15 via the common bus 17. 

The processor core 31 successively reads and 
executes the instruction codes of the user program stored 
5 in the local memory 32* 

When the instruction code "gen (user program name)" 
is executed, the processor core 31 outputs an execution 
instruction of the user program specified by the argument 
(user program name) to the arbiter via the common bus 17 
10 shown in Fig. 1. 

When the instruction code "wait (user program name)" 
is executed, the processor core 31 enters a 
synchronization waiting state, while when a notice 
indicating that the instruction code "end" or "sleep" of 
15 the user program specified by the argument (user program 
name) is executed is input from the arbiter 66 via the 
common bus 17 shown in Fig. 1, the synchronization 
waiting state is released and the next instruction code 
is executed. 

20 When the processor core 31 executes the instruction 

code "cont(user program name)", it outputs an instruction 
to release the state of waiting for execution of the user 
program specified by the argument (user program name) to 
the arbiter 66 via the common bus 17 and executes the 

25 next instruction code without suspending the processing. 
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When the processor core 31 executes the instruction 
code "sleep", it notifies the arbiter 66 that the 
instruction code is executed via the common bus 17 and 
simultaneously enters a waiting state. When an 
5 instruction to release the waiting state is input from 
the arbiter 66, the processor core 31 releases the 
waiting state and executes the next instruction code. 

Also, when the processor core 31 executes the 
instruction code "end", it notifies the arbiter 66 that 

10 the instruction code is executed via the common bus 17 
and simultaneously ends the execution of the program. 

The common memory 65 stores, for example, user 
programs Prg_a f Prg_b, Prg_c, Prg_d, and Prg_e writing 
various instruction codes including the above "gen (user 

15 program name)", "wait(user program name)", "cont(user 
program name) " , "sleep" , and "end" . 

When an execution instruction of a user program is 
input from one of the processor elements 61 x to 61 4 via 
the common bus 17, the arbiter 66 reads the user program 

20 specified by the execution instruction from the common 

memory 65 to, for example, the local memory 32 of the one 
of the processor elements 61 x to 61 4 having the smallest 
load based on the execution states of the processor 
elements 61 t to 61 4 . 

25 Also, when an instruction to release the waiting 



state of execution of the user program is input, the 
arbiter 66 outputs an instruction indicating to release 
the waiting state to the one of the processor elements 
61 1 to 61 4 executing the user program specified by the 
5 instruction via the common bus 17. 

When a notice indicating that the instruction codes 
"sleep" and "end" were executed is input from one of the 
processor elements 61 1 to 61 4 , the arbiter 66 notifies 
the one of the processor elements 61 x to 61 4 which output 
10 the instruction for executing the program including the 
instruction codes that the notice was input. 

Next, the operation of the multiprocessor 51 will 
be explained while tracing the process of execution of 
user programs in the processor elements 61 L to 61 4 of the 
15 multiprocessor 51 shown in Fig. 1. 

Here, as shown in Fig. 2, a case is illustrated v 
where user programs Prg_a, Prg_b, Prg_c, Prg_d, and Prg_e 
are executed in the processor elements 61 x , 61 2 , 61 3 , and 
61 4 . 

20 Note that the user programs Prg_a, Prg„b, Prg_c, 

Prg_d, and Prg_e are read to the common bus 17 shown in 
Fig. 1 from a computer-readable storage medium such as a 
magnetic disk, magnetic tape, optical disk, or magneto - 
optic disk. 

25 First, the arbiter 66 reads the user program Prg_a 
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shown in Fig. 2 from the common memory 66 into the local 
memory 32 of the processor element 61 x . 

Then, instruction codes written in the user program 
Prg_ a are successively executed in the processor core 31 
5 of the processor element 61 L . 

Specifically, an instruction code " gen ( Pr g_b ) " is 
executed first in the processor core 31 of the processor 
element 61 1# then an execution instruction of the user 
program Prg_b specified by the argument of the 
10 instruction code is output to the arbiter 66 via the 
common bus 17 shown in Fig. 1. 

Then, the arbiter 66 reads the user program Prg_b 
from the common memory 65 into the local memory 32 of the 
processor element 61 2 via the common bus 17. Then, 
15 instruction codes written in the user program Prg_b 

stored in the local memory 32 are successively read and 
executed in the processor element 61 2 . 

Next, the processor core 31 of the processor 
element 61 t successively executes instruction codes 
20 "gen(Prg_c) " and "gen(Prg_d) " . Through similar processing 
as in the above instruction code " gen ( Pr g_a ) " , the 
processor cores 31 of the processor element 61 3 and 61 4 
respectively start to execute the user programs Prg_c and 
Prg_d. 

25 Next, the processor core 31 of the processor 
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element 61 1 executes an instruction code "wait (Prg__d) " 
and enters a synchronization waiting state. 

Then, the processor element 61 4 executes an 
instruction code "sleep" written in the user program 
5 Prg_d and enters a waiting state. 

Also, the arbiter 66 is notified via the common bus 
17 that the instruction code "sleep" was executed in the 
processor element 61 4 . When the notice is input to the 
arbiter 66, the arbiter 66 notifies that the notice was 
10 input to the processor element 61 1 which output the 
instruction to execute the user program Prg_d. 

Then, when the processor element 61 L receives as an 
input the notice from the arbiter 66, the synchronization 
waiting state is released and the next instruction code 
15 is executed in the processor element 61 x . 

Next, the processor core 31 of the processor 
element 61 L executes an instruction code "wait (Prg_c) " 
and enters a synchronization waiting state. 

Then, an instruction code "end" written at the end 
20 of the user program Prg_c is executed in the processor 

element 61 3 , whereupon the execution of the user program 
Prg_c by the processor element 61 3 is ended. 

Also, the arbiter 66 is notified via the common bus 
17 that the instruction code "end" was executed in the 
2 5 processor element 61 3 . 



When -the notice is input to the arbiter 66, the 
arbiter 66 notifies the processor element 61 L which 
output the instruction to execute the user program Prg_c 
that the notice was input . 

When the notice is input to the processor element 
61 x from the arbiter 66, the synchronization waiting 
state is released and the next instruction code is 
executed in the processor element 61 L . 

Note that when the notice indicating the 
instruction code "end" was executed in the processor 
element 61 3 , the arbiter 66 judges that the load on the 
processor element 61 3 is lifted and frees the local 
memory 32 of the processor element 61 3 . 

Next, the processor core 31 of the processor 
element 61 L executes the instruction code " gen ( Pr g_e ) " , 
and, through similar processing as in the case of the 
above instruction code n gen(Prg_a) " , the user program 
Prg_e is read to the local memory 32 of the processor 
element 61 3 which is freed as explained above and the 
processor core 31 of the processor element 61 3 executes 
the user program Prg_e . 

Next, the processor element 61 x executes the 
instruction code "cont (Prg__d) " and outputs an instruction 
to release the waiting state for executing the user 
program Prg_d to the arbiter 66 via the common bus 17. 



- 28 - 

Then, the arbiter 66 outputs via the common bus 17 the 
instruction to release the waiting state to the processor 
element 61 4 executing the user program Prg_d. 

When the processor element 61 4 receives as an input 
5 the instruction from the arbiter 66, the waiting state is 
released and the next instruction code is executed in the 
processor element 61 4 . 

Note that after executing the instruction code 
"cont (Prg_d) n , the processor element 61 x executes the 
10 next instruction code without suspending the processing. 

Next, the processor core 31 of the processor 
element 61 x executes the instruction code "wait (Prg_d) " 
and enters a synchronization waiting state. 

Then, an instruction code "end" written at the end 
15 of the user program Prg_d is executed in the processor 

element 61 4# whereupon the execution of the user program 
Prg_d in the processor element 61 4 is ended. 

Also, the arbiter 66 is notified via the common bus 
17 that the instruction code "end" was executed in the 
20 processor element 61 4 . When the notice is input to the 
arbiter 66, it is notified from the arbiter 66 to the 
processor element 61 x which output the instruction to 
execute the user program Prg_d that the notice was input . 

When the processor element 61 x receives as an input 
25 the notice from the arbiter 66, the synchronization 



waiting state is released and the next instruction code 
is executed in the processor element 61 L . 

Note that when the arbiter 66 receives as an input 
the notice indicating that the instruction code "end" was 
executed in the processor element 61 4 , as explained 
above, the arbiter 66 judges that the load on the 
processor element 61 4 was lifted and frees the local 
memory 32 of the processor element 61 4 . 

Next, the processor core 31 of the processor 
element 61 x executes the instruction code "wait (Prg_e) " 
and enters a synchronization waiting state. 

Then the instruction code "end" written at the end 
of the user program Prg_e is executed in the processor 
element 61 3 , whereupon the execution of the user program 
Prg_e in the processor element 61 3 is ended. 

Also, the arbiter 66 is notified via the common bus 
17 that the instruction code "end" is executed in the 
processor element 61 3 . When the notice is input to the 
arbiter 66, the arbiter notifies the processor element 
61 1 which output the Instruction to execute the user 
program Prg_e that the notice is input. 

When the processor element 6^ receives as an input 
the notice from the arbiter 66, the synchronization 
waiting state is released and the next instruction code 
is executed in the processor element 61J.. 



Note that when the arbiter 66 receives as an input 
the notice indicating that the instruction code "end" is 
executed in the processor element 61 3 , as explained 
above, the arbiter Judges that the load of the processor 
5 element 61 3 is lifted and frees the local memory 32 of 
the processor element 61 3 . 

As explained above, according to the multiprocessor 
51, by using the instruction code "sleep" to instruct a 
waiting state for execution of the program and the 
10 instruction code "cont" to release the waiting state in 

addition to the instruction code "end" to indicate an end 
of the program, it is possible to establish 
synchronization between instruction codes of programs 
being executed in different processor elements 61 x to 
15 6 1 4 - Therefore, according to the multiprocessor 51, it 

can be made possible to perform a variety of processings 
based on programs written to establish synchronization 
between instruction codes. 

Namely, in the same way as with the multiprocessor 
20 1 of the related art, it becomes possible to establish 
synchronization between user programs without an end of 
execution of the user program by an instruction code 
"end" . 

Also, according to the multiprocessor 51, as shown 
25 in Fig. 2, since an instruction code "wait (Prg_d) " is 
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written prior to an instruction code "cont (Prg_d) " in the 
user program Prg_a, it is possible to prevent an 
instruction code "cont (Prg_d) " from being executed prior 
to an execution of an instruction code "sleep" of the 
5 user program Prg_d. As a result, the waiting state of the 
processor element 61 4 due to the instruction code "sleep" 
is reliably released by the execution of the instruction 
code "cont (Prg_d) " in the processor element 61 L . 

Also, according to the multiprocessor 51, when an 

10 instruction code "sleep" is executed, by inputting to the 
arbiter 66 a notice indicating that the instruction code 
"sleep" is executed, it becomes possible for the arbiter 
66 to know that the execution of a user program including 
the instruction code "sleep" is to be resumed in the 

15 future. Therefore, the arbiter 66 can prevent the user 

program from being switched with another user program and 
the number of operations to read the user program is 
reduced, so the processing time can be made shorter. 

Specifically, in the example shown in Fig. 2, it is 

20 sufficient to read only once the user program Prg_d from 
the common memory 65 to the local memory 32 of the 
processor element 61 4 , so the waiting time for the 
processor element 61 4 to read the user program Prg_d can 
be made shorter. Also, when resuming the execution of the 

25 user program Prg_d, this is instantly notified to the 
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processor element 61 4 by the execution of the instruction 
code "cont (Prg„d) " in the processor element 61 x . This is 
effective especially when executing a user program 
requiring real time characteristics in which high speed 
5 response is required. 

Namely, it is possible to prevent needless 
operation of the processor element ll x reading a user 
program Prg_D to the local memory 32 when executing the 
instruction code " gen ( Prg_D ) " for the second time as 
10 explained using Fig. 8. 

The effect of the prevention of needless reading of 
the user program from the common memory 65 to the local 
memory 32 is especially remarkable when the memory 
capacity of the local memory 32 and the size of the user 
15 program to be read are of the same order. 
Second Embodiment 

The multiprocessor of the second embodiment is 
basically the same as the multiprocessor 51 of the first 
embodiment. The point of difference from the 
20 multiprocessor 51, however, is that it uses an 

instruction code "cont_a" as a wait release instruction, 
which will be explained below, instead of using the 
instruction code "cont" of the fist embodiment. 

Namely, when execution of an instruction code 
25 "cont" in one processor element comes later than 
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execution of an instruction code "sleep" corresponding to 
the instruction code "cont" in other processor elements, 
the instruction code "cont" of the above first embodiment 
cannot release the waiting state of the other processor 
5 elements due to the instruction code "sleep". In order to 
prevent such a situation, in the first embodiment, for 
example as shown in Fig. 2, it was necessary to write in 
the user program Prg__a an instruction code "wait (Prg_d) " 
and release the synchronization waiting state by 

10 execution of an instruction code "sleep" of the user 

program Prg_d prior to the writing of an instruction code 
n cont(Prg_d) " . 

In the present embodiment, by using a new 
instruction code "cont_a" instead of the instruction code 

15 "cont" of the first embodiment, the writing of an 
instruction code "wait" prior to that is made 
unnecessary. 

The instruction code "cont__a" designates a user 
program name as an argument in the same way as the 
20 instruction code "cont". Namely, the instruction format 
becomes "cont_a(user program name)". 

Also, when the "cont_a(user program name)" is 
executed, processor cores 31 of the processor elements 
61 1 to 61 4 output instructions to release the waiting 
25 state for executing the user program specified by the 
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argument (user program name) to the arbiter 66 via the 
common bus 17 shown in Fig. 1. 

When the instruction code "cont_ a (user program 
name)" is executed, the processor cores 31 execute the 
5 next instruction codes without suspending the processing. 

Also, the processor elements 61 x to 61 4# after 
receiving as input an inverse synchronization waiting 
instruction from the arbiter 66, enter an inverse 
synchronization waiting state until an instruction to 
10 release the state is input from the arbiter 66. 

When the instruction to release a waiting state for 
execution of a user program is input, the arbiter 66 
judges whether the user program is in the waiting state. 
When it is judged to be in the waiting state, the arbiter 
15 66 outputs to the one of the processor elements 61 x to 
61 4 assigned the user program in the waiting state an 
instruction to release the waiting state via the common 
bus 17. 

On the other hand, when it is judged in the above 
20 judgement that the user program is not in the waiting 

state, the arbiter 66 outputs an inverse synchronization 
waiting instruction to the one of the processor elements 
61 x to 61 4 which output the instruction to release the 
waiting state for the execution of the user program. 
25 Then, when the notice that an instruction code "sleep" 
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was executed Is received as an input, the arbiter 66 
outputs to the one of the processor elements 61 1 to 61 4 
in the above inverse synchronization waiting state an 
instruction indicating to release the state and, at the 
5 same time, outputs to the one of the processor elements 
61 x to 61 4 which executed the instruction code "sleep" 
via the common bus 17 an instruction to release the 
waiting state. 

Below, the operation of the multiprocessor of the 

10 present embodiment will be explained by tracing the 

process of execution of user programs in the processor 
elements 61 L to 61 4 shown in Fig 1. 

Here, as shown in Fig. 3, a case will be explained 
where user programs Prg_aa, Prg_b, Prg_c, Prg_d, and 

15 Prg_e are executed in the processor elements 61 t to 61 4 
shown in Fig. 1. 

The user program Prg_aa shown in Fig. 3 is 
different from the above user program Prg_a in the point 
that an instruction code n cont_a(Prg_d) n is written 

20 instead of the instruction code "cont (Prg_d) " and the 

instruction "wait (Prg_d) " prior to that shown in Fig. 2. 
The user programs Prg__b, Prg_c, Prg_d, and Prg_e are 
respectively the same as the user programs Prg_b, Prg_c, 
Prg_d, and Prg_e shown in Fig. 2. 

25 First, as shown in Fig. 3, the processor element 




- 36 - 



61 x successively executes instruction codes "gen(Prg_b) n , 
n gen ( Prg_c ) " , n gen(Prg_d) " f and "wait (Prg_c) " written in 
the user program Prg_aa, 

The processing is the same as that of the above 
5 instruction codes having identical names explained by- 
using Fig. 2. 

When a synchronization waiting state due to the 
instruction code "wait (Prg_ c) n is released, the 
instruction code n cont_a(Prg_d) " is executed in the 
10 processor core 31 of the processor element 61 L . 

As a result, an instruction to release the waiting 
state for execution of the user program Prg__d is output 
from the processor element 61^ to the arbiter 66 via the 
common bus 17 shown in Fig. 1. 
15 Then, it is Judged in the arbiter 66 whether the 

user program Prg_d being executed in the processor 
element 61 4 is in the waiting state or not. Since it is 
not in the waiting state in this case, an inverse 
synchronization waiting instruction is output to the 
20 processor element 61 x . 

Then the processor element 61 x which received as 
input the inverse synchronization waiting instruction 
enters an inverse synchronization waiting state. 

Then the instruction code "sleep" of the user 
25 program Prg_d is executed in the processor element 61 4 , 
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and a notice indicating that the instruction code "sleep" 
is executed is output from the processor element 61 4 to 
the arbiter 66. Then based on the notice, an instruction 
to release the inverse synchronization waiting state is 
5 output to the processor element 61 x from the arbiter 66 
via the common bus 17 and the state is released in the 
processor element 61 L . 

Next, the processor element 61 x executes 
instruction codes " gen ( Pr g_e ) " , "wait (Prg_d) " , and 

10 "wait (Prg_e) " of the user program Prg_aa. The processing 
of the executions is the same as the case explained above 
by using Fig. 2. 

Note, for example as shown in Fig. 4, when the 
instruction code "sleep" in the processor element 61 4 is 

15 executed at a timing earlier than execution of the 

instruction code n cont_a(Prg_d) " in the processor element 
61 1# the processor element 61 4 enters a waiting state 
after executing the instruction code "sleep". When an 
instruction to release the waiting state based on the 

20 execution of the instruction code "cont_a(Prg_d) " in the 
processor element 61 L is input from the arbiter 66, the 
waiting state is released in the processor element 61 4 . 

As explained above, according to the multiprocessor 
of the present embodiment, by using the above instruction 

25 code "cont_a"; it becomes unnecessary, for example, to 
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write the instruction code "wait" by which the 
synchronization waiting state is released by an execution 
of the instruction code "sleep" prior to the writing of 
the instruction code "cont" in the same way as in the 
first embodiment shown in Fig. 2. Namely, in the case 
shown in Fig. 2, even when the execution of the 
instruction code "sleep" of the user program Prg_d is 
carried out at a timing prior to that of the instruction 
code "cont_a(Prg_d) " , the processor element 61 x enters an 
inverse synchronization waiting state and synchronization 
between the user programs Prg_aa and Prg_d is guaranteed. 

As a result, a programmer can write the user 
program Prg_aa without considering the execution timing 
of the instruction code "sleep" of the user program Prg_d 
so the work load can be reduced. 

The present invention is not limited to the above 
embodiments . 

For example, in the above embodiments, a case was 
explained where instructions to release the waiting 
state, output by the processor elements 61 L to 61 2 in 
accordance with the execution of the instruction codes 
"cont" and "cont_a" , are output to processor elements 61 r 
to 61 4 executing user programs in the waiting state via 
the arbiter 66, however, the processor elements 61 L to 
61 4 executing user programs in the waiting state may 



- 39 - 

monitor the common bus 17, so instructions to release the 
waiting state need not be input via the arbiter 66. 

Also, similarly, a notice indicating that the 
instruction codes "sleep" and "end" output from the 
5 processor elements 61 x to 61 4 may be directly input to 
the corresponding processor elements 61 1 to 61 4 
monitoring the common bus 17, that is, not via the 
arbiter 66. 

Also, in the above embodiments, a case was 
10 explained where the instruction codes "gen", "cont", and 
"cont_a" are written only in the user programs Prg__a and 
Prg_aa. However, the instruction codes may be written in 
the user programs Prg_b, Prg_c, and Prg_d as well. 

Also, the instruction code "sleep" may be written 
15 in a plurality of user programs. 

Further, in the above embodiments, an example of a 
multiprocessor having four processor elements of 
identical configurations was shown. However, any number 
of the processor elements may be used if more than two 
20 and the configuration of the plurality of processor 
elements may be different. 

Also in the above embodiment , a case was explained 
where the components shown in Fig. 1 were provided on the 
same semiconductor chip. The present invention, however, 
2 5 can also be applied to a distributed processing system 
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wherein, for example, the processor elements 61 L to 61 4 
shown in Fig. 1 are provided in different computers 
connected by a network. 

Summarizing the effects of the invention, as 
explained above, according to the parallel processor of 
the present invention, various forms of synchronization 
can be established among programs executed in parallel in 
a plurality of processing means. 

Also, according to the parallel processing method 
and storage medium of the present invention, various 
forms of synchronization can be established among a 
plurality of processings carried out in parallel. 

Also, according to the parallel processor of the 
present invention, by executing a waiting instruction by 
a processing means, it is possible to control the reading 
of a program from a first storage means to a second 
storage means corresponding to the processing means and 
to reduce the number of read operations of the program 
from the first storage means to the second storage means 
and thereby shorten the waiting time of the processing 
means . 

While the invention has been described with 
reference to specific embodiment chosen for purpose of 
illustration, it should be apparent that numerous 
modifications could be made thereto by those skilled in 




the art without departing from the basic concept and 
scope of the invention. 



